NSF PAR Search | NSF Public Access Repository

Genomic delineation and description of species and within-species lineages in the genus Pantoea

https://doi.org/10.3389/fmicb.2023.1254999

Crosby, Katherine C; Rojas, Mariah; Sharma, Parul; Johnson, Marcela A; Mazloom, Reza; Kvitko, Brian H; Smits, Theo_H M; Venter, Stephanus N; Coutinho, Teresa A; Heath, Lenwood S; et al (November 2023, Frontiers in Microbiology)

As the name of the genus Pantoea (“of all sorts and sources”) suggests, this genus includes bacteria with a wide range of provenances, including plants, animals, soils, components of the water cycle, and humans. Some members of the genus are pathogenic to plants, and some are suspected to be opportunistic human pathogens; while others are used as microbial pesticides or show promise in biotechnological applications. During its taxonomic history, the genus and its species have seen many revisions. However, evolutionary and comparative genomics studies have started to provide a solid foundation for a more stable taxonomy. To move further toward this goal, we have built a 2,509-gene core genome tree of 437 public genome sequences representing the currently known diversity of the genus Pantoea. Clades were evaluated for being evolutionarily and ecologically significant by determining bootstrap support, gene content differences, and recent recombination events. These results were then integrated with genome metadata, published literature, descriptions of named species with standing in nomenclature, and circumscriptions of yet-unnamed species clusters, 15 of which we assigned names under the nascent SeqCode. Finally, genome-based circumscriptions and descriptions of each species and each significant genetic lineage within species were uploaded to the LINbase Web server so that newly sequenced genomes of isolates belonging to any of these groups could be precisely and accurately identified.
Full Text Available
LINgroups as a principled approach to compare and integrate multiple bacterial taxonomies

https://doi.org/10.1145/3535508.3545546

Mazloom, Reza; Pritchard, Leighton; Brown, C. Titus; Vinatzer, Boris A.; Heath, Lenwood S. (July 2022, Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics)

Traditional taxonomy provides a hierarchical organization of bacteria and archaea across taxonomic ranks from kingdom to subspecies. More recently, bacterial taxonomy has been more robustly quantified using comparisons of sequenced genomes, as in the Genome Taxonomy Database (GTDB), resolving down to genera and species. Such taxonomies have proven useful in many contexts, yet lack the flexibility and resolution of a more fine-grained approach. We apply our Life Identification Number (LIN) approach as a common, quantitative framework to tie existing (and future) bacterial taxonomies together, increase the resolution of genome-based discrimination of taxa, and extend taxonomic identification below the species level in a principled way. We utilize our existing concept of a LINgroup as an organizational concept for microorganisms that are closely related by overall genomic similarity, to help resolve some of the confusions and unforeseen negative effects of nomenclature changes of microbes due to genome-based reclassification. Our results obtained from experimentation demonstrate the value of LINs and LINgroups in mapping between taxonomies, translating between different nomenclatures, and integrating them into a single taxonomic framework.
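The idea of mapping between taxonomies through LINgroups can be illustrated with a minimal sketch. It assumes, following the wider LIN literature rather than anything stated in this abstract, that a LIN is a sequence of integers and that a LINgroup is identified by a shared LIN prefix. The function names, taxon names, and LIN values below are hypothetical and are not taken from LINbase or from the paper.

```python
from typing import Dict, List, Tuple

# Assumption: a LIN is modeled as a tuple of integers, and a LINgroup
# as a LIN prefix shared by all of its member genomes.
LIN = Tuple[int, ...]


def in_lingroup(lin: LIN, prefix: LIN) -> bool:
    """Return True if the LIN starts with the LINgroup's prefix."""
    return lin[:len(prefix)] == prefix


def relate(a: LIN, b: LIN) -> str:
    """Describe how LINgroup a relates to LINgroup b."""
    if a == b:
        return "same"
    if in_lingroup(b, a):  # b extends a, so a is the broader group
        return "contains"
    if in_lingroup(a, b):  # a extends b, so a is the narrower group
        return "within"
    return "disjoint"


def translate(name: str,
              source: Dict[str, LIN],
              target: Dict[str, LIN]) -> List[Tuple[str, str]]:
    """List the target-taxonomy groups that overlap a source-taxonomy group."""
    prefix = source[name]
    result = []
    for other, other_prefix in target.items():
        relation = relate(prefix, other_prefix)
        if relation != "disjoint":
            result.append((other, relation))
    return result


# Hypothetical taxonomies expressed as named LINgroups.
taxonomy_a = {"Species X": (0, 1, 2)}
taxonomy_b = {
    "Species X1": (0, 1, 2, 0),
    "Species X2": (0, 1, 2, 1),
    "Species Y": (0, 3),
}

for taxon, relation in translate("Species X", taxonomy_a, taxonomy_b):
    print(f"Species X {relation} {taxon}")
```

Because both taxonomies are expressed in the same prefix space, comparing two names reduces to a prefix check, which is what allows one taxonomy's split or merged taxa to be located in another.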
Full Text Available
CS5604 (Information Retrieval) Fall 2020 Front-end (FE) Team Project

Cao, Yusheng; Mazloom, Reza; Ogunleye, Makanjuola; Chekuri, Satvik; Fox, Edward A. (December 2020, Information storage and retrieval)
With the demand for and abundance of information increasing over the last two decades, generations of computer scientists have been trying to improve the whole process of information searching, retrieval, and storage. With the diversification of information sources, users' requirements for the data have also changed drastically, both in terms of usability and performance. Due to the growth of the source material and requirements, correctly sorting, filtering, and storing data has given rise to many new challenges in the field. With the help of all four other teams on this project, we are developing an information retrieval, analysis, and storage system to retrieve data from Virginia Tech's Electronic Thesis and Dissertation (ETD), Twitter, and Web Page archives. We seek to provide an appropriate data research and management tool that lets users access specific data. The system will also give certain users the authority to manage and add more data to the system. This project's deliverable will be combined with four others to produce a system usable by Virginia Tech's library system to manage, maintain, and analyze these archives. This report introduces the system components and the design decisions behind how it has been planned and implemented. Our team has developed a front-end web interface that is able to search, retrieve, and manage three important content collection types: ETDs, tweets, and web pages. The interface incorporates a simple hierarchical user permission system, providing different levels of access to its users. In order to facilitate the workflow with other teams, we have containerized this system and made it available on the Virginia Tech cloud server. The system also makes use of a dynamic workflow system using a KnowledgeGraph and Apache Airflow, providing high levels of functional extensibility. This allows curators and researchers to use containerized services for crawling, pre-processing, parsing, and indexing the custom corpora and collections available to them in the system.
Full Text Available
A Hybrid Domain Adaptation Approach for Identifying Crisis-Relevant Tweets

https://doi.org/10.4018/IJISCRAM.2019070101

Mazloom, Reza; Li, Hongmin; Caragea, Doina; Caragea, Cornelia; Imran, Muhammad (July 2019, International Journal of Information Systems for Crisis Response and Management)

The huge amounts of data generated on social media during emergency situations are regarded as a trove of critical information. The use of supervised machine learning techniques in the early stages of a crisis is challenged by the lack of labeled data for that event. Furthermore, supervised models trained on labeled data from a prior crisis may not produce accurate results, due to inherent crisis variations. To address these challenges, the authors propose a hybrid feature-instance-parameter adaptation approach based on matrix factorization, k-nearest neighbors, and self-training. The proposed feature-instance adaptation selects a subset of the source crisis data that is representative of the target crisis data. The selected labeled source data, together with unlabeled target data, are used to learn self-training domain adaptation classifiers for the target crisis. Experimental results have shown that, overall, the hybrid domain adaptation classifiers perform better than the supervised classifiers learned from the original source data.
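The self-training step mentioned in this abstract can be sketched generically as follows. This is not the authors' implementation: it omits the matrix factorization and source-instance selection stages, and it uses a plain k-nearest-neighbors voter as the base classifier purely for illustration. The function names, the confidence threshold, and the toy data are all assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, int]


def knn_predict(train: List[Example], x: Vector, k: int = 3) -> Tuple[int, float]:
    """Return (label, confidence); confidence is the winning share of the k votes."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)


def self_train(source: List[Example],
               target: List[Vector],
               k: int = 3,
               threshold: float = 0.9,
               max_rounds: int = 10) -> Tuple[List[Example], List[Vector]]:
    """Iteratively pseudo-label confident target points and add them to the training set."""
    labeled = list(source)    # starts as labeled source-crisis data
    unlabeled = list(target)  # unlabeled target-crisis data
    for _ in range(max_rounds):
        confident, remaining = [], []
        for x in unlabeled:
            label, confidence = knn_predict(labeled, x, k)
            if confidence >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:  # nothing new was learned, so stop early
            break
        labeled.extend(confident)
        unlabeled = remaining
    return labeled, unlabeled


# Toy 2-D feature vectors standing in for tweet representations (hypothetical).
source_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0),
               ((5, 5), 1), ((5, 6), 1), ((6, 5), 1)]
target_data = [(1, 1), (6, 6)]

adapted, leftover = self_train(source_data, target_data)
print(len(adapted), "training examples after self-training;", len(leftover), "left unlabeled")
```

The threshold controls how aggressively target points are pseudo-labeled: a high value adds fewer but more reliable examples per round, which matters when source and target crises differ.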
Full Text Available
Classification of Twitter Disaster Data Using a Hybrid Feature-Instance Adaptation Approach

Mazloom, Reza; Li, Hongmin; Caragea, Doina; Caragea, Cornelia; Imran, Muhammad (April 2018, Proceedings of the 15th Annual Conference for Information Systems for Crisis Response and Management (ISCRAM))
Full Text Available
